Hierarchical Document Clustering Using Correlation Preserving Indexing

Author

  • L. Prabhakar
Abstract

This paper presents a spectral clustering method called correlation preserving indexing (CPI), which is performed in the correlation similarity measure space. CPI explicitly considers the manifold structure embedded in the similarities between documents. Its aim is to find an optimal semantic subspace by simultaneously maximizing the correlation between documents within local patches and minimizing the correlation between documents outside those patches. Correlation is a similarity measure that can capture the intrinsic structure of high-dimensional data. To reduce the computational cost of CPI, we propose applying the bi-iterative least square method to reduce the dimensionality. In a comparison of CPI with other clustering methods, using the bi-iterative least square method considerably reduced the computation time.

Key Terms: Document clustering; Correlation preserving indexing; Singular value decomposition; Dimensionality reduction; Correlation measure; QR decomposition

Full Text: http://www.ijcsmc.com/docs/papers/August2013/V2I8201352.pdf
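The idea of measuring document similarity by correlation and grouping each document with its most correlated neighbours can be illustrated with a minimal sketch. It assumes that correlation means the normalized inner product of two term-frequency vectors, which the abstract does not spell out. The function names `correlation` and `nearest_neighbors` and the toy vectors are hypothetical and used only for illustration. The sketch does not cover the subspace optimization or the bi-iterative least square step.

```python
import math


def correlation(u, v):
    """Normalized inner product of two equal-length term vectors (assumed definition)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as uncorrelated with everything
    return dot / (norm_u * norm_v)


def nearest_neighbors(docs, i, k):
    """Indices of the k documents most correlated with docs[i], i.e. its local patch."""
    scores = [(correlation(docs[i], d), j) for j, d in enumerate(docs) if j != i]
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


# Toy term-frequency vectors (hypothetical data): documents 0 and 1 share
# vocabulary, as do documents 2 and 3.
docs = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 1, 3, 1],
]

patch = nearest_neighbors(docs, 0, 1)
print(patch)
```

Here documents 0 and 1 point in the same direction, so their correlation is at its maximum even though their lengths differ, while documents 0 and 2 share no terms and have zero correlation. A method of the kind described would try to keep the first pair close and the second pair apart in the reduced space.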

Similar resources

Correlation Preserved Indexing Based Approach For Document Clustering

Document clustering is the act of collecting similar documents into clusters, where similarity is some function on a document. Document clustering method achieves 1) a high accuracy for documents 2) document frequency can be calculated 3) term weight is calculated with the term frequency vector. Document clustering is closely related to the concept of data clustering. Document clustering is a m...

Grouping and Categorization of Documents in Relativity Measure

This paper presents a spectral clustering method called correlation through preserving indexing (CPI), which is performed in the correlation similarity measure space. The documents are mapped into a low-dimensional semantic space in which the correlations between documents in the local patches are maximized and the correlations between documents outside these patches are minimized. The intrins...

Non-hierarchical Clustering with Rival Penalized Competitive Learning for Information Retrieval

In large content-based image database applications, efficient information retrieval depends heavily on good indexing structures of the extracted features. While indexing techniques for text retrieval are well understood, efficient and robust indexing methodology for image retrieval is still in its infancy. In this paper, we present a non-hierarchical clustering scheme for index generation using the...

Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining

Nowadays, the World Wide Web is a collection of a large amount of information that is increasing day by day. This growing amount of information calls for an efficient and effective indexing structure. Indexing has become the major issue for improving the performance of Web search engines, so that the most relevant web documents are retrieved in minimum possib...

Effects of Visual Concept-based Post-retrieval Clustering in ImageCLEFphoto 2008

We examined the effectiveness of post-retrieval clustering that was based on the visual similarities among images to enhance the instance recall in the photo retrieval task of ImageCLEF 2008. The visual similarities are defined by the example visual concepts that were provided for the automatic photo indexing task. We tested two types of visual concepts and two kinds of clustering methods, hier...

Publication date: 2013